CpG sites, or CG sites, are regions of DNA where a cytosine nucleotide is followed by a guanine nucleotide in the linear sequence of bases along its 5' → 3' direction. CpG sites occur with high frequency in genomic regions called CpG islands.
Cytosines in CpG dinucleotides can be methylated to form 5-methylcytosines. Enzymes that add a methyl group are called DNA methyltransferases. In mammals, 70% to 80% of CpG cytosines are methylated. Methylating the cytosine within a gene can change its expression, a mechanism that belongs to epigenetics, a larger field of science that studies gene regulation. Methylated cytosines often mutate to thymine.
In humans, about 70% of promoters located near the transcription start site of a gene (proximal promoters) contain a CpG island.
CpG dinucleotides are underrepresented in the genome as a consequence of the high mutation rate of methylated CpG sites: the spontaneously occurring deamination of a methylated cytosine results in a thymine, and the resulting G:T mismatched bases are often improperly resolved to A:T, whereas the deamination of unmethylated cytosine results in a uracil, which as a foreign base is quickly replaced by a cytosine by the base excision repair mechanism. The C to T transition rate at methylated CpG sites is about 10-fold higher than at unmethylated sites.
Figure: Distribution of CpG sites (left: in red) and GpC sites (right: in green) in the human APRT gene. CpG sites are more abundant in the upstream region of the gene, where they form a CpG island, whereas GpC sites are more evenly distributed. The 5 exons of the APRT gene are indicated (blue), and the start (ATG) and stop (TGA) codons are emphasized (bold blue).
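The distinction between CpG and GpC depends only on reading order along one strand in the 5' → 3' direction. The following is a minimal sketch of how the two dinucleotides could be located and counted in a sequence string. The function names and the example sequence are made up for illustration and are not taken from the APRT gene.

```python
from typing import List


def dinucleotide_positions(seq: str, dinuc: str) -> List[int]:
    """Return 0-based positions where `dinuc` starts, reading 5' -> 3'."""
    seq = seq.upper()
    dinuc = dinuc.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc]


def count_dinucleotide(seq: str, dinuc: str) -> int:
    """Count occurrences of a dinucleotide on the given strand."""
    return len(dinucleotide_positions(seq, dinuc))


# Hypothetical toy sequence, written 5' -> 3'.
example = "TTCGCGAACGTA"

cpg_sites = dinucleotide_positions(example, "CG")  # C followed by G
gpc_count = count_dinucleotide(example, "GC")      # G followed by C

print("CpG positions:", cpg_sites)
print("GpC count:", gpc_count)
```

Because "CG" is its own reverse complement, every CpG found on one strand corresponds to a CpG on the opposite strand, so scanning a single strand is sufficient for locating sites.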
CpG dinucleotides frequently occur in CpG islands (see definition of CpG islands, below). There are 28,890 CpG islands in the human genome (50,267 if one includes CpG islands in repeat sequences). This is in agreement with the 28,519 CpG islands found by Craig Venter et al., since the Venter et al. genome sequence did not include the interiors of highly similar repetitive elements and the extremely dense repeat regions near the centromeres. Since CpG islands contain multiple CpG dinucleotide sequences, there appear to be more than 20 million CpG dinucleotides in the human genome.
Many genes in mammalian genomes have CpG islands associated with the start of the gene (promoter regions). Because of this, the presence of a CpG island is used to help in the prediction and annotation of genes.
In mammalian genomes, CpG islands are typically 300–3,000 base pairs in length, and have been found in or near approximately 40% of promoters of mammalian genes. Over 60% of human genes and almost all house-keeping genes have their promoters embedded in CpG islands. Given the frequency of GC two-nucleotide sequences, the number of CpG dinucleotides is much lower than would be expected.
A 2002 study revised the rules of CpG island prediction to exclude other GC-rich genomic sequences such as Alu sequences. Based on an extensive search of the complete sequences of human chromosomes 21 and 22, DNA regions greater than 500 bp were found more likely to be the "true" CpG islands associated with the 5' regions of genes if they had a GC content greater than 55% and an observed-to-expected CpG ratio of 65%.
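The criteria above can be sketched as a check on a single candidate region. The thresholds (length over 500 bp, GC content over 55%, observed-to-expected ratio of 65%) come from the text. The observed-to-expected formula used here, (number of CpG × length) / (number of C × number of G), is the commonly used definition and is an assumption, since the text does not spell it out. Treating 65% as a minimum is also an assumption, and the function names are hypothetical. This tests one window only and is not a genome-wide island finder.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed-to-expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    n_c = seq.count("C")
    n_g = seq.count("G")
    if n_c == 0 or n_g == 0:
        return 0.0  # no CpG is possible without both bases
    n_cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return n_cpg * len(seq) / (n_c * n_g)


def meets_island_criteria(seq: str,
                          min_length: int = 500,
                          min_gc: float = 0.55,
                          min_obs_exp: float = 0.65) -> bool:
    """Apply the three thresholds described in the text to one region."""
    return (len(seq) > min_length
            and gc_content(seq) > min_gc
            and cpg_obs_exp(seq) >= min_obs_exp)


# Hypothetical toy inputs, not real genomic sequence.
print(meets_island_criteria("CG" * 300))  # long, GC-rich, CpG-dense
print(meets_island_criteria("AT" * 300))  # long but contains no C or G
```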
CpG islands are characterized by CpG dinucleotide content of at least 60% of that which would be statistically expected (~4–6%), whereas the rest of the genome has much lower CpG frequency (~1%), a phenomenon called CG suppression. Unlike CpG sites in the coding region of a gene, in most instances the CpG sites in the CpG islands of promoters are unmethylated if the genes are expressed. This observation led to the speculation that methylation of CpG sites in the promoter of a gene may inhibit gene expression. Methylation, along with histone modification, is central to imprinting. Most of the methylation differences between tissues, or between normal and cancer samples, occur a short distance from the CpG islands (at "CpG island shores") rather than in the islands themselves.
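The "statistically expected" figure can be illustrated with a short calculation. If cytosine and guanine occurred independently, the expected CpG frequency would be the product of the C and G base fractions. The sketch below assumes a genome-wide GC content of about 42% split evenly between C and G. That value is an assumption chosen for illustration and is not stated in the text. The observed frequency of about 1% is the figure quoted above.

```python
def expected_cpg_fraction(gc_content: float) -> float:
    """Expected CpG dinucleotide frequency if C and G occur independently.

    Assumes C and G each make up half of the GC content.
    """
    frac_c = gc_content / 2
    frac_g = gc_content / 2
    return frac_c * frac_g


assumed_gc = 0.42     # assumption: approximate genome-wide GC content
observed_cpg = 0.01   # ~1% CpG frequency outside islands, from the text

expected = expected_cpg_fraction(assumed_gc)
ratio = observed_cpg / expected

print(f"expected CpG frequency: {expected:.2%}")
print(f"observed/expected ratio: {ratio:.2f}")
```

Under these assumptions the expected frequency falls inside the ~4–6% range quoted above, and the bulk genome carries only about a quarter of the CpG dinucleotides that base composition alone would predict.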
CpG islands typically occur at or near the transcription start site of genes, particularly housekeeping genes, in vertebrates. A C (cytosine) base followed immediately by a G (guanine) base (a CpG) is rare in vertebrate DNA because the cytosines in such an arrangement tend to be methylated. This methylation helps distinguish the newly synthesized DNA strand from the parent strand, which aids in the final stages of DNA proofreading after duplication. However, over time methylated cytosines tend to turn into thymines because of spontaneous deamination. There is a special enzyme in humans (thymine-DNA glycosylase, or TDG) that specifically replaces the T in T/G mismatches. However, due to the rarity of CpGs, it is theorised to be insufficiently effective in preventing a possibly rapid mutation of the dinucleotides. The existence of CpG islands is usually explained by the existence of selective forces for relatively high CpG content, or low levels of methylation in that genomic area, perhaps having to do with the regulation of gene expression. A 2011 study showed that most CpG islands are a result of non-selective forces.
Distal promoter elements also frequently contain CpG islands. An example is the DNA repair gene ERCC1, where the CpG island-containing element is located about 5,400 nucleotides upstream of the transcription start site of the ERCC1 gene. CpG islands also occur frequently in promoters for functional noncoding RNAs such as microRNAs.
One 2012 study listed 147 specific genes with colon cancer-associated hypermethylated promoters, along with the frequency with which these hypermethylations were found in colon cancers. At least 10 of those genes had hypermethylated promoters in nearly 100% of colon cancers. The study also indicated 11 microRNAs whose promoters were hypermethylated in colon cancers at frequencies between 50% and 100% of cancers. MicroRNAs (miRNAs) are small endogenous RNAs that pair with sequences in messenger RNAs to direct post-transcriptional repression. On average, each microRNA represses several hundred target genes. Thus microRNAs with hypermethylated promoters may be allowing over-expression of hundreds to thousands of genes in a cancer.
The information above shows that, in cancers, promoter CpG hyper/hypo-methylation of genes and of microRNAs causes loss of expression (or sometimes increased expression) of far more genes than does mutation.
On the other hand, the promoters of two genes, PARP1 and FEN1, were hypomethylated and these genes were over-expressed in numerous cancers. PARP1 and FEN1 are essential genes in the error-prone and mutagenic DNA repair pathway microhomology-mediated end joining. If this pathway is over-expressed, the excess mutations it causes can lead to cancer. PARP1 is over-expressed in tyrosine kinase-activated leukemias, in neuroblastoma, in testicular and other germ cell tumors, and in Ewing's sarcoma. FEN1 is over-expressed in the majority of cancers of the breast, prostate, stomach, pancreas, and lung, and in neuroblastomas.
DNA damage appears to be the primary underlying cause of cancer.
As reviewed by Duke et al., neuron DNA methylation (repressing expression of particular genes) is altered by neuronal activity. Neuron DNA methylation is required for synaptic plasticity and is modified by experiences, and active DNA methylation and demethylation are required for memory formation and maintenance.
In 2016 Halder et al., using mice, and in 2017 Duke et al., using rats, subjected the rodents to contextual fear conditioning, causing an especially strong long-term memory to form. At 24 hours after the conditioning, in the hippocampus brain region of rats, the expression of 1,048 genes was down-regulated (usually associated with 5mCpG in gene promoters) and the expression of 564 genes was up-regulated (often associated with hypomethylation of CpG sites in gene promoters). At 24 hours after training, 9.2% of the genes in the rat genome of hippocampus neurons were differentially methylated. However, while the hippocampus is essential for learning new information, it does not store information itself. In the mouse experiments of Halder, 1,206 differentially methylated genes were seen in the hippocampus one hour after contextual fear conditioning, but these altered methylations were reversed and not seen after four weeks. In contrast with the absence of long-term CpG methylation changes in the hippocampus, substantial differential CpG methylation could be detected in cortical neurons during memory maintenance. There were 1,223 differentially methylated genes in the anterior cingulate cortex of mice four weeks after contextual fear conditioning.
As reviewed in 2018, in brain neurons, 5mC is oxidized by the ten-eleven translocation (TET) family of dioxygenases (TET1, TET2, TET3) to generate 5-hydroxymethylcytosine (5hmC). In successive steps TET enzymes further hydroxylate 5hmC to generate 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC). Thymine-DNA glycosylase (TDG) recognizes the intermediate bases 5fC and 5caC and cleaves the glycosidic bond, resulting in an apyrimidinic site (AP site). In an alternative oxidative deamination pathway, 5hmC can be oxidatively deaminated by activation-induced cytidine deaminase/apolipoprotein B mRNA editing complex (AID/APOBEC) deaminases to form 5-hydroxymethyluracil (5hmU), or 5mC can be converted to thymine (Thy). 5hmU can be cleaved by TDG, single-strand-selective monofunctional uracil-DNA glycosylase 1 (SMUG1), Nei-Like DNA Glycosylase 1 (NEIL1), or methyl-CpG binding protein 4 (MBD4). AP sites and T:G mismatches are then repaired by base excision repair (BER) enzymes to yield cytosine (Cyt).
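The demethylation routes described above can be summarized as a small directed graph, which makes it easy to list every path from 5mC back to cytosine. This is only an illustrative sketch: the edges are taken from the paragraph above, the edge from 5hmU to an AP site assumes that glycosylase cleavage of 5hmU leaves an AP site, and the names `PATHWAY` and `routes` are hypothetical.

```python
from typing import Dict, List, Tuple

# Each entry maps an intermediate to (product, enzyme or enzyme family).
PATHWAY: Dict[str, List[Tuple[str, str]]] = {
    "5mC": [("5hmC", "TET"), ("Thy", "AID/APOBEC")],
    "5hmC": [("5fC", "TET"), ("5hmU", "AID/APOBEC")],
    "5fC": [("5caC", "TET"), ("AP site", "TDG")],
    "5caC": [("AP site", "TDG")],
    # Assumption: cleavage of 5hmU by a glycosylase leaves an AP site.
    "5hmU": [("AP site", "TDG/SMUG1/NEIL1/MBD4")],
    "Thy": [("Cyt", "BER")],      # T:G mismatch repair
    "AP site": [("Cyt", "BER")],
    "Cyt": [],
}


def routes(start: str, end: str) -> List[List[str]]:
    """Enumerate all paths from `start` to `end` by depth-first search.

    The graph has no cycles, so the recursion always terminates.
    """
    if start == end:
        return [[start]]
    found = []
    for product, _enzyme in PATHWAY[start]:
        for tail in routes(product, end):
            found.append([start] + tail)
    return found


for path in routes("5mC", "Cyt"):
    print(" -> ".join(path))
```

Listing the routes this way shows that every path ends in a base excision repair step, whether it passes through the TET oxidation products or through a deamination product.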
Two reviews summarize the large body of evidence for the critical and essential role of reactive oxygen species (ROS) in memory formation. The DNA demethylation of thousands of CpG sites during memory formation depends on initiation by ROS. In 2016, Zhou et al. showed that ROS have a central role in DNA demethylation.
TET1 is a key enzyme involved in demethylating 5mCpG. However, TET1 is only able to act on 5mCpG if an ROS has first acted on the guanine to form 8-hydroxy-2'-deoxyguanosine (8-OHdG), resulting in a 5mCp-8-OHdG dinucleotide (see first figure in this section). After formation of 5mCp-8-OHdG, the base excision repair enzyme OGG1 binds to the 8-OHdG lesion without immediate excision. Adherence of OGG1 to the 5mCp-8-OHdG site recruits TET1, allowing TET1 to oxidize the 5mC adjacent to 8-OHdG, as shown in the first figure in this section. This initiates the demethylation pathway shown in the second figure in this section.
Altered protein expression in neurons, controlled by ROS-dependent demethylation of CpG sites in gene promoters within neuron DNA, is central to memory formation.
CpG loss
Genome size and CpG ratio are negatively correlated
Alu elements as promoters of CpG loss